Abstract: The Critical Assessment of Small Molecule Identification, or CASMI, contest 
was founded in 2012 to provide scientists with a common open dataset to evaluate their 
identification methods. In this article, the challenges and solutions for the inaugural CASMI 
2012 are presented. The contest was split into four categories corresponding with tasks to 
determine molecular formula and molecular structure, each from two measurement types, 
liquid chromatography-high resolution mass spectrometry (LC-HRMS), where preference 
was given to high mass accuracy data, and gas chromatography-electron impact-mass 
spectrometry (GC-MS), i.e., unit accuracy data. These challenges were obtained from plant 
material, environmental samples and reference standards. It was surprisingly difficult to 
obtain data suitable for a contest, especially for GC-MS data where existing databases are 
very large. The level of difficulty of the challenges is thus quite varied. In this article, the 
challenges and the answers are discussed, and recommendations for challenge selection in 
subsequent CASMI contests are given. 

Keywords: mass spectrometry; metabolite identification; small molecule identification; 
contest; metabolomics; non-target identification 

1. Introduction 

The CASMI contest, the Critical Assessment of Small Molecule Identification, was founded in 2012. 
The aim of CASMI [1] was to encourage experts to exhibit their identification methods on a common 
dataset and, thus, enable a better comparison of the methods available. The task was to determine 
the molecular formula and/or the molecular structure from the mass spectrometry data. The myriad of 
options available for small molecule identification (vendor software, specialized independent software, 
open access and open source options) makes it increasingly difficult for users and researchers alike to 
keep pace with the changes. Instead, offering a common dataset enables the use of expert knowledge 
or any chosen identification methods and provides a basis for comparison. The aim of CASMI was to 
include all disciplines interested in small molecule identification and, thus, enable the cross-disciplinary 
exchange of information and expertise. In this article, small molecule identification refers to molecules 
of approximately 50-1,000 Da that can be detected with mass spectrometric (MS) techniques. 

Although MS identification methods are often categorized according to the chromatographic 
separation used (e.g., gas chromatography (GC) versus liquid chromatography (LC)), with relatively 
recent instrumental developments, such as high resolution and soft-ionization GC-MS, it is difficult 
to distinguish separation, detection and identification techniques and set distinct categories for a 
competition to allow a broad range of participants. The inaugural CASMI focused on two measurement 
types, liquid chromatography-high resolution mass spectrometry (LC-HRMS), where preference was 
given to high mass accuracy MS/MS data, and gas chromatography-electron impact-mass spectrometry 
(GC-MS), focusing on unit mass accuracy data. Although this excluded some participants, e.g., those 
with only unit mass accuracy LC-MS/MS data experience and those with high mass accuracy GC-MS 
data, these categories could be considered for future CASMI contests. 

The data collection commenced in the early months of 2012, with the original aim of 20 challenges 
per category. The 'unknowns' could not be truly unknown for the purpose of the competition and, thus, 
required a confirmed identity. However, this made it difficult to obtain suitable data, especially for 
the GC-MS data where many of the challenges available were also in common databases. In the end, 
challenges were obtained from plant material, environmental samples and reference standards. As it was 
difficult to find GC-MS challenges that were not in the NIST database [2], challenges were provided 
that only had a relatively low probability (<60%) in the database search, although a couple with high 
probability (>90%) were added to give some variety. This compromise meant that the level of difficulty 
of the challenges was quite varied. Despite this compromise, however, the initial target of 20 suitable 
challenges for each category was not achieved. In the GC-MS dataset, the final 16 challenges were 
all confirmed with reference standards, and although other substances were available in the samples 
provided, they did not have matching standards. For the LC-HRMS data, it was difficult to obtain 
identified "unknowns", as those already published could be linked to the names and/or institutes of the 
organizers, while those unpublished were often intended for a forum other than CASMI. This finally 
resulted in 14 LC-HRMS challenges. Six of the challenges were part of pathway elucidation efforts to 
determine gene function during investigations into the biochemistry of natural products and their role 
in the development and defenses of plants. The remaining eight environmental substances were taken 
from "failed confirmations" (which were not suitable for publication alone) and one successful target 
identification of a rare compound, not yet published. 

In retrospect, given the small number of participants for the first CASMI and the fact that not all 
participants contributed to all challenges within a category, the number of challenges seemed appropriate. 
Although a smaller number of challenges may have encouraged more participants, a larger number of 
challenges is needed to provide sufficient variety in the difficulty and chemical diversity of the challenges 
and to allow a proper evaluation. The disadvantage of providing many challenges is that it creates an 
advantage for fully automated entries, as methods requiring the input of human expertise are generally 
more time-consuming. 

In this article, the challenges and the answers are discussed, along with recommendations for 
challenge selection in subsequent CASMI contests. Details about the participants and the outcome of 
the first contest can be found in [3], also in this special issue. 

2. LC-HRMS Challenges and Solutions (Category 1 and 2) 

The LC-HRMS challenges were sourced from plant material and standards purchased for 
confirmation of unknowns in environmental investigations. The challenge compounds were composed only of the 
elements C, H, N, O, P and S; no halogens were present in any compound; see Table 1 for an overview. Appendix A 
contains annotated spectra for each challenge, which show the composite spectrum of all available 
MS/MS files for each challenge and, thus, display the most intense peak where a given peak occurred in 
multiple spectra within the error window of 0.0001 Da plus 5 ppm. The MS spectra were also included 
for certain challenges. The fragments were annotated using ACD ChemSketch [4] and Mass Frontier [5] 
and processed automatically using OpenBabel [6] and a script in R [7] to determine placement. Although 
the most realistic fragments were selected, many of these are tentative and have not been confirmed 
unambiguously. Appendix B provides more details on the challenge compounds, including PubChem 
and ChemSpider identifiers. 
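
The compositing rule described above (where a peak occurs in several spectra within 0.0001 Da plus 5 ppm, only the most intense instance is kept) can be sketched roughly as follows. This is a minimal Python illustration and not the R script used for the article; the function name merge_spectra, the greedy grouping of neighbouring peaks and the example peak values are assumptions made purely for illustration.

```python
def merge_spectra(spectra, abs_tol=0.0001, ppm_tol=5.0):
    """Merge several (mz, intensity) peak lists into one composite spectrum.

    Peaks closer than abs_tol + ppm_tol (ppm of the m/z) to the previously
    kept peak are treated as the same peak; the most intense one is kept.
    """
    peaks = sorted((p for s in spectra for p in s), key=lambda p: p[0])
    merged = []
    for mz, intensity in peaks:
        if merged:
            last_mz, last_intensity = merged[-1]
            window = abs_tol + last_mz * ppm_tol * 1e-6
            if abs(mz - last_mz) <= window:
                if intensity > last_intensity:
                    merged[-1] = (mz, intensity)
                continue
        merged.append((mz, intensity))
    return merged


# Hypothetical peak lists, e.g. from two collision energies.
spectrum_a = [(77.0386, 20.0), (105.0335, 100.0)]
spectrum_b = [(105.0337, 250.0), (146.0600, 40.0)]
composite = merge_spectra([spectrum_a, spectrum_b])
print(composite)
```

A greedy pass over sorted peaks is the simplest choice; a clustering step would be needed if chains of closely spaced peaks should not be merged transitively.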

As it proved difficult to obtain suitable challenge compounds for this contest, there was no 'easy vs. 
hard' pre-selection. In the end, this meant some of the challenges were quite difficult, while others 
were too easy. Although challenges that were in reference databases were avoided as far as possible, 
some compounds were uploaded to MassBank [8] after the challenge data was released. 

2.1. LC-HRMS Challenges 1 to 6 

The first six challenges were metabolites that were encountered as part of plant metabolomics 
research. The compounds were measured on a Bruker micrOTOF-Q equipped with an electrospray 
ionization (ESI) source in positive mode, which generally achieves <5 ppm mass accuracy and 12,000 
resolution during routine measurements. At this resolution, the extraction of the isotopic fine structure 
(which would resolve, e.g., the 15N or 34S isotope peaks) is not possible, but the isotope intensities 
are generally very accurate. The data was acquired with a 3 Hz scan frequency for both MS and 
MS/MS acquisitions. 
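
To illustrate why isotopic fine structure is out of reach at a resolution of about 12,000, the following sketch estimates the resolving power needed to separate the 13C and 15N contributions to the M+1 peak. It uses the simplified criterion R = m/Δm and standard isotope masses; the example m/z (the [M+H]+ of kanamycin A, taken as 484.2381 plus a proton) and the function name are assumptions chosen for illustration.

```python
# Mass differences (Da) between the heavy and light isotope of each element.
DELTA_13C = 13.0033548378 - 12.0
DELTA_15N = 15.0001088982 - 14.0030740048


def required_resolving_power(mz, delta_m):
    """Simplified estimate: resolving power R = m / delta_m."""
    return mz / abs(delta_m)


mz_example = 485.2454  # assumed [M+H]+ of kanamycin A
fine_structure_split = DELTA_13C - DELTA_15N  # spacing of 13C vs 15N at M+1
needed = required_resolving_power(mz_example, fine_structure_split)
print(f"split = {fine_structure_split:.5f} Da, required R = {needed:.0f}")
```

Under these assumptions, the required resolving power is several times higher than the 12,000 quoted for routine measurements, which is consistent with the statement above.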
Table 1. Liquid chromatography (LC) Challenges for the Critical Assessment of Small 
Molecule Identification (CASMI) 2012. 

Challenge  Trivial Name                                Formula       Exact mass
1          Kanamycin A                                 C18H36N4O11   484.2381
2          1,2-Bis-O-sinapoyl-beta-D-glucoside         C28H32O14     592.1792
3          Glucolesquerellin                           C14H27NO9S3   449.0848
4          Escholtzine                                 C19H17NO4     323.1158
5          Reticuline                                  C19H23NO4     329.1627
6          Rheadine                                    C21H21NO6     383.1369
10         1-Aminoanthraquinone                        C14H9NO2      223.0633
11         1-Pyrenemethanol                            C17H12O       232.0888
12         alpha-(o-Nitro-p-tolylazo)acetoacetanilide  C17H16N4O4    340.1172
13         Benzyldiphenylphosphine oxide               C19H17OP      292.1017
14         1H-Benz[g]indole                            C12H9N        167.0735
15         1-Isopropyl-5-methyl-1H-indole-2,3-dione    C12H13NO2     203.0946
16         [1-(4-methoxyanilino)-1-oxopropan-2-yl]     C18H21N3O5    359.1481
           6-oxo-1-propylpyridazine-3-carboxylate
17         Nitrin                                      C13H13N3      211.1109
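
The exact masses in Table 1 are monoisotopic masses and can be reproduced from the molecular formulas. The short Python sketch below shows one way to do this; the function name and the simple formula parser (element symbol followed by an optional count, no brackets or charges) are illustrative assumptions, and only the six elements occurring in the LC-HRMS challenges are included.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a simple formula such as 'C19H17OP'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


for name, formula in [("Kanamycin A", "C18H36N4O11"),
                      ("Glucolesquerellin", "C14H27NO9S3"),
                      ("Benzyldiphenylphosphine oxide", "C19H17OP")]:
    print(f"{name}: {monoisotopic_mass(formula):.4f}")
```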



Challenge 1 was kanamycin A (C18H36N4O11), an aminoglycoside compound with antibiotic effects 
from bacteria. The compound was available as an authentic standard. The challenge data comprised 
the full-scan data, including two isotope peaks and fragment-rich MS/MS spectra at 10 eV, 20 eV and 
30 eV in positive mode, shown in Figure A1. The MS/MS of the three collision energies were acquired 
in consecutive scans, which reduced the effective scan frequency for one collision energy to 1 Hz. The 
LC-HRMS/MS data was processed with the XCMS centWave feature detection [9], and the compound 
spectrum was extracted with CAMERA [10]. This approach is described in greater detail in [11]. 

Challenge 2 was 1,2-bis-O-sinapoyl-β-D-glucoside (C28H32O14), which was extracted from canola 
seeds and characterized previously [12]. The challenge data in negative mode included isotopes up 
to (M + 3) and a single fragment-rich MS/MS spectrum, shown in Figure A2, which was also extracted 
with XCMS and CAMERA, as described above. The raw data provided initially was affected by a 
severe calibration problem, resulting in approximately 30 ppm mass deviation. The data, recalibrated to within 5 ppm 
accuracy, was provided to the participants after the contest closed, to offer them a chance to recalculate 
their results on more accurate data for the special issue. 
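
To put the calibration problem into perspective, the following sketch converts a relative mass deviation in ppm into an absolute deviation in Da and back. The helper names are hypothetical, and the [M-H]- m/z of Challenge 2 is assumed here to be the exact mass from Table 1 minus the mass of a proton.

```python
PROTON = 1.007276  # Da


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_to_da(mz, ppm):
    """Absolute deviation (Da) corresponding to a ppm value at a given m/z."""
    return mz * ppm * 1e-6


mz_challenge2 = 592.1792 - PROTON  # assumed [M-H]- of Challenge 2
shift_30ppm = ppm_to_da(mz_challenge2, 30)
shift_5ppm = ppm_to_da(mz_challenge2, 5)
print(f"30 ppm = {shift_30ppm:.4f} Da, 5 ppm = {shift_5ppm:.4f} Da")
```

At this m/z, a 30 ppm deviation corresponds to roughly 0.018 Da, which is large enough to admit many additional candidate formulas in a formula search.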

Challenge 3 was glucolesquerellin (C14H27NO9S3), found with other glucosinolates in the seeds 
of Brassicaceae. Among others, the glucosinolates 3-methylthiopropyl (3MTP, glucoibervirin), 
4-methylthiobutyl (4MTB, glucoerucin), 7-methylthioheptyl (7MTH) and 8-methylthiooctyl (8MTO) 
are described in [13]. The challenge data was measured from a methanolic extract of Arabidopsis 
thaliana seeds, in negative mode. Although no authentic standard was used, the confidence in the 
identification was quite high based on the molecular formula determined with high mass accuracy data, 
characteristic product ions and the consistency of the structural information (including retention time) 
with other glucosinolates of different chain lengths. Isotopes were present up to (M + 4). The MS/MS 
spectra (see Figure A3) were extracted with XCMS and CAMERA (as described above) and did not 
contain the precursor ion for collision energies above 20 eV. 

Challenges 4-6 were combined into a single sample and measured together in positive mode. As for 
Challenge 1, the collision energy was alternated in the raw file, but in contrast to the previous challenges, 
the MS/MS data was extracted from a single scan for each compound and collision energy. All peaks 
below an intensity of 1% of the base peak were removed. The spectra are given in Figures A4-A6. 
The data provided originally was not calibrated and had mass deviations up to 8 ppm. After the closing 
of the contest, the data was recalibrated and provided to the participants. This resulted in deviations 
below 5 ppm for Challenges 4 and 6, but at the same time, increased the mass error for Challenge 5 to 
approximately 6 ppm. 
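
The noise filter mentioned above (removal of all peaks below 1% of the base peak) is simple to express in code. The sketch below is a generic illustration, not the processing script used for the contest data; the function name and the example peak list are hypothetical.

```python
def filter_by_relative_intensity(peaks, threshold=0.01):
    """Keep (mz, intensity) peaks at or above threshold * base peak intensity."""
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    return [(mz, intensity) for mz, intensity in peaks
            if intensity >= threshold * base_peak]


# Hypothetical spectrum: the 9.0 peak is below 1% of the base peak (1000.0).
example_peaks = [(100.0, 1000.0), (150.0, 9.0), (200.0, 10.0), (250.0, 500.0)]
filtered = filter_by_relative_intensity(example_peaks)
print(filtered)
```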

Challenge 4 was the alkaloid escholtzine (C19H17NO4). The isotopic pattern included only peaks up 
to (M + 2), while the 30 eV MS/MS spectrum was very noisy. 

Challenge 5 was another alkaloid, reticuline (C19H23NO4). While the 20 eV MS/MS spectrum still 
contains the precursor, the 30 eV spectrum contains a few additional fragments below m/z 176. 

Challenge 6 was the alkaloid rheadine (C21H21NO6). The MS/MS spectra contained more fragments 
than the previous challenge. 

As all of these compounds were in PubChem, they could be considered "known unknowns". 
Challenges 7 to 9 are absent; as discussed above, the original aim of 20 challenges was not attained, 
and the original numbering was kept in this article for consistency with the participant results and 
publications. 

2.2. LC-HRMS Challenges 10 to 17 

These challenges resulted from unconfirmed tentative identifications arising from the effect-directed 
analysis (EDA) of river water sampled from the Elbe (Czech Republic) using the passive sampler, 
blue rayon [14], where CASMI provided some 'use' for standards that otherwise had no specific 
purpose. As a result, some of these challenges are quite difficult, whereas others are more 
straightforward. All these challenges were taken from measurements of reference standards, using either 
ESI or atmospheric pressure chemical ionization (APCI) techniques; the fragmentation modes were 
either collision-induced dissociation (CID) or higher-energy collisional dissociation (HCD); the settings 
reported are as normalized collision energies (NCE). 

Challenge 10 was 1-aminoanthraquinone, shown in Figure A7. Although amino groups are usually a 
distinctive loss in many compounds, here, the first losses are a water from a carbonyl group (m/z 206), 
resulting in a rearrangement to form a stabilizing four-membered ring with the amino-substituent, as 
well as the loss of the full carbonyl group itself (m/z 196). The loss of a full benzyl group results 
in the fragment at m/z 146, likely also stabilized by the formation of a four-membered ring; while the 
remaining fragment at m/z 105.033 is likely to result from the loss of the same benzyl ring along with one 
of the carbonyl groups, where the charge remains with the smaller fragment. The accurate mass of the 
fragment confirms the formula C7H5O, rather than, e.g., a nitrogen adduct (m/z 105.044), such as those 
seen in [15]. 
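
The distinction between the two candidate fragments at nominal m/z 105 can be checked numerically. The sketch below computes the m/z of singly charged cations from their elemental composition; it assumes the nitrogen adduct has the composition C6H5N2+ (a phenyl cation plus N2), and the function and variable names are illustrative.

```python
ELECTRON = 0.00054858  # Da
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def cation_mz(composition):
    """m/z of a singly charged cation given a dict of element counts."""
    return sum(MASS[el] * n for el, n in composition.items()) - ELECTRON


c7h5o = cation_mz({"C": 7, "H": 5, "O": 1})      # C7H5O+
n2_adduct = cation_mz({"C": 6, "H": 5, "N": 2})  # assumed C6H5N2+
separation_ppm = (n2_adduct - c7h5o) / c7h5o * 1e6
print(f"C7H5O+ = {c7h5o:.4f}, C6H5N2+ = {n2_adduct:.4f}, "
      f"difference = {separation_ppm:.0f} ppm")
```

The two ions differ by about 0.011 Da, far more than a 5 ppm mass error at this m/z, so high mass accuracy data can separate them even though they are indistinguishable at unit mass accuracy.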

Challenge 11 was 1-pyrenemethanol and had a difficult MS spectrum to interpret, although the very 
simple fragmentation pattern was also informative. Both the MS and MS/MS are plotted in Figure A8. 
The behavior of substances can be considerably less consistent with APCI and atmospheric pressure photo 
ionization (APPI) compared with ESI, and this substance undergoes an in-source loss and oxidation to 
an [M - H]+ ion. The only losses are the hydroxy group and the complete methanol substituent. The fact 
that no other fragments are generated despite targeted MS/MS on the m/z 215 peak indicates that a stable 
aromatic backbone is likely to be present. In-source oxidation has been reported previously; for example, 
in [16], the isobars tonalide and galaxolide could not be separated chromatographically, but could be 
identified using their different ionization behavior in positive mode. Tonalide was visible as both [M]+• 
and [M + H]+, whereas galaxolide was detected as [M - H]+ (an in-source oxidation product) and 
the [M]+• ion. The authors explained this with differing proton affinities, demonstrating that galaxolide 
has a lower proton affinity than the proton donors in the APPI source and, thus, competed unfavorably for 
the protons. 
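
The three positive-mode ion species discussed here differ only slightly in m/z, which the following sketch makes explicit. It computes the expected m/z of [M + H]+, [M]+• and [M - H]+ from a neutral monoisotopic mass; the function name is hypothetical and the example uses the exact mass of 1-pyrenemethanol from Table 1.

```python
PROTON = 1.00727647    # Da
ELECTRON = 0.00054858  # Da
HYDROGEN = 1.00782503  # Da, mass of a hydrogen atom


def positive_ion_mz(neutral_mass):
    """Expected m/z of common singly charged positive ions."""
    return {
        "[M+H]+": neutral_mass + PROTON,                # protonated molecule
        "[M]+.": neutral_mass - ELECTRON,               # radical cation
        "[M-H]+": neutral_mass - HYDROGEN - ELECTRON,   # hydride abstraction
    }


ions = positive_ion_mz(232.0888)  # 1-pyrenemethanol, exact mass from Table 1
for species, mz in ions.items():
    print(f"{species}: {mz:.4f}")
```

Because the species are spaced by roughly 1 Da, an unexpected ion type can easily be mistaken for a different molecular formula if only [M + H]+ is considered.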

Challenge 12 was alpha-(o-nitro-p-tolylazo)acetoacetanilide, commonly known as "Pigment Yellow 1", 
and was a target compound identified only through site-specific information [14]. This challenge would 
be difficult for de novo structure elucidation, as it is quite a big molecule and has a wide variety of 
functional groups. The many functional groups also make it difficult to incorporate predictive selection 
strategies. Even with knowledge of the true structure, it was difficult to annotate all the major MS/MS 
peaks using either simple bond-breaking approaches or the general and library fragmentation rules in 
Mass Frontier [5]. The major annotations are shown in Figure A9. It is likely that a much more detailed 
elucidation of the fragmentation processes would be needed to annotate all peaks, which was beyond the 
scope of this article. 

Challenge 13 was benzyldiphenylphosphine oxide and was one of the easier challenges, for database 
searching and structure generation alike, when taking the spectrum into account. The only "degree of 
freedom" was the location of the CH2 or CH3 group (i.e., whether a benzyl or methylphenyl substituent 
was present). The spectrum of this compound and similar compounds were uploaded to MassBank [8,17] 
before the submission deadline. The major fragments are shown in Figure A10. 

Challenge 14 was 1H-benz[g]indole, another stable, aromatic compound. Although the spectra 
(shown together in Figure A11) display more fragments than Challenge 11, the collision energy is much 
greater here (HCD 120 and 180 NCE, compared with CID at 35 NCE above). The fragments at m/z 167 
and m/z 168 are potentially a mix of [M]+• and [M + H]+, with a H loss as the first major fragment. 
The remaining fragments are successive two-member losses from the aromatic system; first, CNH 
followed by two C2Hx losses. The fragments given in Figure A11 are indicative; a rearrangement may 
stabilize the fragments at m/z 141 and m/z 91. 

Challenge 15 was 1-isopropyl-5-methyl-1H-indole-2,3-dione and has quite a small, 
aromatic-stabilized system with a distinctive isopropyl loss in the MS/MS spectrum, followed 
again by the break-up of the aromatic system (see Figure A12). The presence of m/z 91 indicates that 
the methyl group is attached to a benzene ring; m/z 106 indicates also that the N is attached to the same 
benzene ring. The carbonyl groups again display a loss of water, as well as the full substituent. 

Challenge 16 was [1-(4-methoxyanilino)-1-oxopropan-2-yl] 6-oxo-1-propylpyridazine-3-carboxylate. 
This challenge was a candidate for an unknown identification, where the original unknown remains 
unidentified. This compound experiences significant fragmentation, such that neither the molecular 
ion nor any adducts of the molecular ion are present in the MS. The MS and MS/MS are merged in 
the spectrum displayed in Figure A13. Energy-based fragmentation scoring (as in, e.g., MetFrag [18]) 
can prioritize the wrong compounds, such as here, where the fragmentation was too favorable. Thus, 
the presence in a compound database does not necessarily mean that the compound is conducive to 
identification via MS/MS analysis. 

Challenge 17 is nitrin, another unconfirmed tentative identification where the presence of a 
C6H5(N=N)+ fragment in the original unknown spectrum led to the (incorrect) tentative identification 
of nitrin. The peak instead arose from a nitrogen adduct formed during MS/MS measurements, a 
phenomenon observed with several aromatic compounds (e.g., [15,19]). One result of the adduct 
detection was the expansion of the fragment formula annotation option in RMassBank to include adducts 
by adding N2 and O to the allowed elements of the subformulas [15]. The spectrum of Challenge 17 is 
shown in Figure A14. The fragment at m/z 105.044 corresponding to a C6H5(N=N)+ fragment is 
conspicuously absent in the MS/MS spectrum of this compound. Instead, fragmentation occurs between 
the Ns, and only a few pieces of the molecule are observed. Interestingly, the fragment at m/z 77 
(characteristic for a phenyl substituent) was very small, confirming that fragmentation occurs preferably 
between the Ns. 
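
One simple way to flag possible nitrogen adducts of the kind described above is to look for pairs of fragment peaks separated by the mass of N2. The sketch below illustrates this idea only; it is not the RMassBank implementation, and the function name, tolerance settings and example peak list are assumptions.

```python
N2_MASS = 2 * 14.0030740048  # monoisotopic mass of N2 in Da


def find_n2_adduct_pairs(mz_values, abs_tol=0.0001, ppm_tol=5.0):
    """Return (fragment, candidate adduct) m/z pairs separated by N2."""
    ordered = sorted(mz_values)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            tolerance = abs_tol + high * ppm_tol * 1e-6
            if abs((high - low) - N2_MASS) <= tolerance:
                pairs.append((low, high))
    return pairs


# Hypothetical peak list: phenyl cation (77.0386), C7H5O+ (105.0335)
# and a peak at 105.0447 that matches phenyl + N2.
adduct_pairs = find_n2_adduct_pairs([77.0386, 105.0335, 105.0447])
print(adduct_pairs)
```

A matching pair is only a hint: a genuine fragment containing an N=N group gives the same mass difference, so the structural context still has to be considered.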

3. GC-MS Challenges and Solutions (Category 3 and 4) 

All GC-MS challenges are summarized in Table 2 and were sourced from real environmental samples 
and were confirmed with reference standards. This requirement of being certain of the identity (for the 
purpose of a contest), but also not being too easy to find in a database, was a big challenge for the 
GC-MS data, as over 200,000 compounds are now included in GC-MS databases, such as NIST [2]. As 
a result, challenges were selected where the probability for a database match was relatively low, i.e., 
not a 'straightforward' identification. Many of these are quite standard compounds, but the spectra were 
taken from real samples (instead of the database) to add some variety. A couple of isomers were chosen 
to see if computational methods could match the ability of databases to distinguish isomers. A couple 
of challenges that did not meet this 'low probability' requirement were added to diversify the challenge 
set further. Considerably more halogens (chlorine only) were present in these spectra than in the 
LC-HRMS challenges. 
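
Chlorine is comparatively easy to recognize in unit mass accuracy data because of its characteristic isotope pattern. As a rough illustration, the sketch below computes the relative intensities of the M, M+2, M+4, ... peaks for a given number of chlorine atoms from a binomial distribution. It assumes approximate natural abundances for 35Cl and 37Cl and ignores the contributions of all other elements (e.g., 13C); the function name is hypothetical.

```python
from math import comb

P_35CL = 0.7576  # approximate natural abundance of 35Cl
P_37CL = 0.2424  # approximate natural abundance of 37Cl


def chlorine_pattern(n_cl):
    """Relative intensities (most intense = 100) of M, M+2, ..., M+2n."""
    probabilities = [comb(n_cl, k) * P_35CL ** (n_cl - k) * P_37CL ** k
                     for k in range(n_cl + 1)]
    top = max(probabilities)
    return [round(100 * p / top, 1) for p in probabilities]


for n in (1, 2, 3, 6):
    print(n, chlorine_pattern(n))
```

For one chlorine atom, this gives the familiar ratio of roughly 3:1 for M to M+2; with six chlorine atoms, as in hexachlorocyclohexane, the M+2 peak becomes the most intense peak of the cluster.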

As no external participants entered these categories, these challenges are not described in 
detail. The structures and several identifiers are given in Appendix C, Figure C18. 

Table 2. Gas chromatography (GC) Challenges for CASMI 2012. 

Challenge  Trivial Name                        Formula      Nominal mass
1          Phthalic anhydride                  C8H4O3       148
2          Phthalimide                         C8H5NO2      147
3          2-Chlorobenzyl alcohol              C7H7ClO      142
4          4-Chlorobenzyl alcohol              C7H7ClO      142
5          1,4-Dichlorobenzene                 C6H4Cl2      146
6          Acenaphthene                        C12H10       154
7          4-Chlorobenzoic acid                C7H5ClO2     156
8          Fluorene                            C13H10       166
9          Methyl 2-chlorobenzoate             C8H7ClO2     170
10         2,4,6-Trichlorophenol               C6H3Cl3O     196
11         Formothion                          C6H12NO4PS2  257
12         alpha-Hexachlorocyclohexane         C6H6Cl6      290
13         Dimethyl carbonotrithioate          C3H6S3       138
14         O,O,O-Trimethyl thiophosphate       C3H9O3PS     156
15         Dibenzofuran                        C12H8O       168
16         O,S,S-Trimethyl phosphorodithioate  C3H9PS2O2    172


3.1. GC-MS Challenges 1 and 2 

These two challenges were chosen due to the availability of standards for retention index (RI) 
calculation [20]. The two compounds are very closely related, differing only by an O versus an NH. 
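
As an illustration of what a retention index calculation involves, the sketch below implements a linear (temperature-programmed) retention index by interpolating between two bracketing reference standards. This is a generic textbook scheme based on an n-alkane ladder and is an assumption made here; the approach and standards used in [20] may differ. The function name and the retention times are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak at retention time rt.

    alkane_rts maps the carbon number of each n-alkane standard to its
    retention time (same units as rt).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    i = bisect_right(times, rt) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("retention time is not bracketed by two standards")
    n, t_n, t_next = carbons[i], times[i], times[i + 1]
    step = carbons[i + 1] - n
    return 100 * (n + step * (rt - t_n) / (t_next - t_n))


# Hypothetical alkane ladder (carbon number -> retention time in minutes).
ladder = {10: 5.0, 11: 6.0, 12: 7.5}
ri = linear_retention_index(6.75, ladder)
print(ri)
```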

3.2. GC-MS Challenges 3 to 16 

Challenges 3-16 came from the EDA of a groundwater sample from Bitterfeld, Germany [21,22]. 
Fractionation using reverse-phase high performance liquid chromatography (RP-HPLC) with a C18 
column and preparative GC (pcGC) was performed prior to the final GC-MS analysis (for more details, 
see [21]). As a result, partitioning information could be calculated for the individual fractions, and 
this provided additional information for the identification, which was made available to the CASMI 
participants. The compounds identified in the sample are quite common environmental contaminants 
that could have resulted in almost trivial identification results for participants with access to a large 
GC-MS database. 

4. Recommendation for Future CASMIs 

The problem of insufficient spectra and, especially for GC, too many spectra in the databases, could 
be improved in future CASMIs by sourcing compounds from a synthetic laboratory, which would be 
able to provide rare compounds, but also confirm their identity. 'Unknown unknowns' are not suitable 
for a competition, as the identity must be known to declare the winner(s). 

Due to the lack of participants in the GC-MS categories, the organizers of the next CASMI may 
consider adding a different category to complement the accurate mass LC-HRMS categories. Some 
possibilities include an accurate mass GC category, or GCxGC-TOF, or changing the focus to different 
MS/MS ionization techniques, rather than forming distinct GC and LC categories. It is also plausible that 
only two categories should be offered in the next CASMI, i.e., restricting the competition to Categories 
1 and 2 only. Another enhancement to the LC-HRMS categories could be the inclusion of challenges 
measured along with a set of standard compounds to provide reference retention times, or providing 
participants with candidate lists that they would need to rank. 

One way to improve participation in future CASMIs could be to provide additional incentives, such 
as prizes. The opportunity to submit papers to a special issue does not appear to have been sufficient 
incentive to attract many participants in the 2012 contest. Although sponsorship would be an option, 
it can compromise the independence (or at least, the appearance of independence) of the competition. 
An alternative, more scientific incentive could be the organization of a CASMI identification workshop, 
which would require more participants to be successful. 

CASMI could also provide the ideal exchange platform for selected 'unknown unknowns' in the 
future, where scientists could submit their unknowns and offer other experts (and expert systems) the 
chance to identify them. Obviously, no winners can be declared when the answer is unknown; the 
contributor of the 'unknown unknown' would be required to decide the appropriate 'reward'. 

Appendix 

A. Annotated spectra 

This appendix contains the annotated spectra for LC-MS Challenges 1-17. The structures were 
determined with the help of experience, ChemSketch [4] and Mass Frontier [5]. Selected fragments were 
added to a script in R [7], while the processing of the spectra, including placement of the fragments, was 
automatic. OpenBabel [6] was used to generate the images. 

Figure A1. Challenge 1: annotated merged MS and MS/MS spectra of kanamycin A 
(electrospray ionization (ESI), positive mode). 

Figure A2. Challenge 2: annotated MS/MS spectrum of 1,2-bis-O-sinapoyl-β-D-glucoside 
(ESI, negative mode). 

Figure A3. Challenge 3: annotated merged MS/MS spectra of glucolesquerellin (ESI, 
negative mode). 

Figure A4. Challenge 4: annotated merged MS/MS spectra of escholtzine (ESI, 
positive mode). 

Figure A5. Challenge 5: annotated merged MS/MS spectra of reticuline (ESI, 
positive mode). 

Figure A6. Challenge 6: annotated merged MS/MS spectra of rheadine (ESI, positive mode). 

Figure A7. Challenge 10: annotated MS/MS spectrum of 1-aminoanthraquinone (ESI, 
positive mode). 

Figure A8. Challenge 11: annotated merged MS and MS/MS spectra of 1-pyrenemethanol 
(APCI, positive mode). 

Figure A9. Challenge 12: annotated merged MS and MS/MS spectra of "Pigment Yellow 1" 
(APCI, positive mode). 

Figure A10. Challenge 13: annotated merged MS/MS spectra of benzyldiphenylphosphine 
oxide (ESI, positive mode). 

Figure A11. Challenge 14: annotated merged MS/MS spectra of 1H-benz[g]indole (APCI, 
positive mode). 

Figure A12. Challenge 15: annotated merged MS and MS/MS spectra of 
1-isopropyl-5-methyl-1H-indole-2,3-dione (APCI, positive mode). 

Figure A13. Challenge 16: annotated merged MS and MS/MS spectra of 
[1-(4-methoxyanilino)-1-oxopropan-2-yl] 6-oxo-1-propylpyridazine-3-carboxylate (APCI, 
positive mode). The [M+H]+ ion was not observed in the measured spectra. 

Figure A14. Challenge 17: annotated merged MS and MS/MS spectra of nitrin (ESI, positive 
mode). 

B. Structures for the LC-HRMS challenges (Categories 1 and 2) 

Figures B15 to B17 contain the structures and identifiers for the LC-HRMS challenges, Categories 1 
and 2. 

Figure B15. Structures and identifiers for LC-HRMS Challenges 1-4. 

Challenge 1: Kanamycin A; C18H36N4O11; PubChem: 6032; ChemSpider: 5810 
Challenge 2: 1,2-Bis-O-sinapoyl-beta-D-glucoside; C28H32O14; PubChem: 5280665; ChemSpider: 4444262 
Challenge 3: Glucolesquerellin; C14H27NO9S3; PubChem: 46173875; ChemSpider: NA 
Challenge 4: Escholtzine; C19H17NO4; PubChem: 12304178; ChemSpider: 16740500 

Figure B16. Structures and identifiers for LC-HRMS Challenges 5-13. 

Challenge 5: Reticuline; C19H23NO4; PubChem: 10233; ChemSpider: 9816 
Challenge 6: Rheadine; C21H21NO6; PubChem: 197775; ChemSpider: 171184 
Challenge 10: 1-Aminoanthraquinone; C14H9NO2; PubChem: 6710; ChemSpider: 6454 
Challenge 11: 1-Pyrenemethanol; C17H12O; PubChem: 104977; ChemSpider: 94729 
Challenge 12: alpha-(o-Nitro-p-tolylazo)acetoacetanilide; C17H16N4O4; PubChem: 221491; ChemSpider: 192174 
Challenge 13: Benzyldiphenylphosphine oxide; C19H17OP; PubChem: 76293; ChemSpider: 68772 

Figure B17. Structures and identifiers for LC-HRMS Challenges 14-17. 

Challenge 14: 1H-Benz[g]indole; C12H9N; PubChem: 98617; ChemSpider: 89061 
Challenge 15: 1-Isopropyl-5-methyl-1H-indole-2,3-dione; C12H13NO2; PubChem: 2145522; ChemSpider: 1606080 
Challenge 16: 1-[(4-Methoxyphenyl)amino]-1-oxo-2-propanyl 6-oxo-1-propyl-1,6-dihydro-3-pyridazinecarboxylate; C18H21N3O5; PubChem: 18091616; ChemSpider: 16896706 
Challenge 17: Nitrin; C13H13N3; PubChem: 68380; ChemSpider: 61666 




C. Structures for GC-MS Challenges 

Figure C18 contains the structures and identifiers for the GC-MS challenges, Categories 3 and 4. 

Figure C18. Structures and identifiers for GC-MS Challenges 1-16. 

Challenge 1: Phthalic anhydride; C8H4O3; PubChem: 6811; ChemSpider: 6552 
Challenge 2: Phthalimide; C8H5NO2; PubChem: 6809; ChemSpider: 6550 
Challenge 3: 2-Chlorobenzyl alcohol; C7H7ClO; PubChem: 28810; ChemSpider: 26799 
Challenge 4: 4-Chlorobenzyl alcohol; C7H7ClO; PubChem: 13397; ChemSpider: 12823 
Challenge 5: 1,4-Dichlorobenzene; C6H4Cl2; PubChem: 4685; ChemSpider: 13866817 
Challenge 6: Acenaphthene; C12H10; PubChem: 6734; ChemSpider: 6478 
Challenge 7: 4-Chlorobenzoic acid; C7H5ClO2; PubChem: 6318; ChemSpider: 6079 
Challenge 8: Fluorene; C13H10; PubChem: 6853; ChemSpider: 6592 
Challenge 9: Methyl 2-chlorobenzoate; C8H7ClO2; PubChem: 11895; ChemSpider: 11402 
Challenge 10: 2,4,6-Trichlorophenol; C6H3Cl3O; PubChem: 6914; ChemSpider: 21106172 
Challenge 11: Formothion; C6H12NO4PS2; PubChem: 17345; ChemSpider: 16412 
Challenge 12: alpha-Hexachlorocyclohexane; C6H6Cl6; PubChem: 727; ChemSpider: 10468511 
Challenge 13: Dimethyl carbonotrithioate; C3H6S3; PubChem: 16840; ChemSpider: 15959 
Challenge 14: O,O,O-Trimethyl thiophosphate; C3H9O3PS; PubChem: 9038; ChemSpider: 8686 
Challenge 15: Dibenzofuran; C12H8O; PubChem: 568; ChemSpider: 551 
Challenge 16: O,S,S-Trimethyl phosphorodithioate; C3H9PS2O2; PubChem: 31435; ChemSpider: 29165 